feat: copy_behaviors to make sub-classing easy #3137

Saransh-cpp · 2024-05-30T20:01:49Z

XRef #2433

lgray · 2024-06-18T13:05:17Z

Any update on the tests for this PR?

Saransh-cpp · 2024-06-20T13:31:08Z

Hi, this already has tests! I think I forgot to request a review.

jpivarski

Be sure to update the rapidjson submodule in this PR so that the PR doesn't try to change it:

src/awkward/_util.py

jpivarski

I edited this PR to remove the continue. (There was a continue in the previous function, unique_list, because it was following a memoization pattern that should fail early with minimal fanfare if an item is in the memo. The copy_behaviors function is different: not isinstance(key, str) and "*" not in key is part of the basic logic.)

Along the way, I noticed something more dangerous: copy_behaviors modifies the behavior in place, and if there's an error part-way through, it will leave the behavior in an intermediate state, which is likely invalid. At least it should not apply its changes until there's no chance of failure. But, better still, it can return new behaviors for the caller to update, rather than change them in place. This is a different API:

ak.behavior.update(ak._util.copy_behaviors(existing_class, new_class, ak.behavior)

The fact that the ak.behavior has to be mentioned twice is a feature: maybe the user wants the new behaviors in some different dict.

I'm still confused about why these two cases are excluded:

key is a single string, such as existing_class.__name__
key includes a "*", which indicates that a class may be inside of nested lists.

As it is, copy_behaviors would copy everything except the ak.Record subclass (e.g. SomeObject with methods) and its vectorized verions (e.g. ArrayOfSomeObject with vectorized methods). These will later be replaced by overloads, but so will some of the NumPy ufunc or Numba typing/lowering overloads—I don't know why these are singled out. But what copy_behaviors should do is determined by its use, and I haven't seen how it's intended to be used.

I see that you fixed the rapidjson directory, so this is ready to merge, if my confusion above is unfounded. Otherwise, you can remove the check for "*" not in key and add a special case for key == existing_class.__name__.

jpivarski

I'm merging this first version of copy_behaviors now, but as I said above, I think it should do a different thing (not skip the keys it's skipping). That can come later.

Saransh-cpp · 2024-06-30T23:06:50Z

Hi @jpivarski, thanks for the review and fixing the PR! Sorry for taking too long to respond here. The MWE below shows why I added the if condition in this PR -

import awkward as ak
import numpy

class SuperVector:
    def add(self, other):
        return ak.zip(
            {"x": self.x + other.x, "y": self.y + other.y},
            with_name="VectorTwoD",
            behavior=self.behavior,
        )

@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
print(ak.behavior)

@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass

print(ak.behavior) 
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>
# 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>

ak.behavior.update(
    ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior)
)
print(ak.behavior) 
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDArray'> -> removing the star condition (the class is wrong)
# ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'> -> removing the string condition

Additionally, replacing the generic string condition with a more specific case -

if oldname != key:

leads to -

{'VectorTwoD': <class '__main__.VectorTwoDRecord'>, ('*', 'VectorTwoD'): <class '__main__.VectorTwoDArray'>, (<ufunc 'add'>, 'VectorTwoD', 'VectorTwoD'): <function <lambda> at 0x1050a7ba0>, 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>, ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>, (<ufunc 'add'>, 'VectorTwoDAgain', 'VectorTwoDAgain'): <function <lambda> at 0x1050a7ba0>, ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'>}

Could you please let me know what should be done here? I tested the current implementation with Coffea and everything works as expected. (I pointed out that there is a bug in copy_behaviors on Slack but that was a false alarm.)

jpivarski · 2024-07-01T01:42:39Z

('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n')

You definitely want to check for the isinstance(key, str) case and not iterate over the string as though it were a tuple.

I was assuming that you'd copy all of the behaviors, including the "VectorTwoD" and the ("*", "VectorTwoD") ones, and then replace the ones that are different for the new class, which may be NumPy overloads, Numba overloads, or whole class overloads. That's why I was thinking you'd want this function:

def copy_behaviors(existing_class: typing.Any, new_class: typing.Any, behavior: dict):
    output = {}

    oldname = existing_class.__name__
    newname = new_class.__name__

    for key, value in behavior.items():
        if isinstance(key, str):
            if key == oldname:
                output[newname] = value
        else:
            if oldname in key:
                new_tuple = tuple(newname if k == oldname else k for k in key)
                output[new_tuple] = value

    return output

and then define the new class after having copied the behaviors:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior))

class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

But now I see the problem. You were using the @mixin_class decorator to add both Array and Record forms of the class to ak.behavior, which is equivalent to assigning them directly into the ak.behavior dict, as I did above, but since it's a decorator, it has to be assigned exactly when the class gets defined. The current API for copy_behaviors takes the class object as an argument, so it has to be called after the class gets defined. That's why you can't copy all behaviors and then overwrite some of them: there are causality constraints on when they can be defined and when copy_behaviors can be called.

This is going to be a problem because the "VectorTwoD" and the ("*", "VectorTwoD") keys are not the only ones that get assigned by decorators: keys like (numpy.add, "VectorTwoD", "VectorTwoD") can also be defined by @ak.mixin_class_method, which has to be called at the time when the class is defined. (See ak.mixin_class_method and Mixin decorators.) If the developer writes a class using this instead of assigning a lambda expression, they'd be in just as much trouble as my example above.

The copy_behaviors function needs to not take the class object as an argument, and it should (as a best practice) be called before the class gets defined. In fact, the function doesn't even use the class objects, just their names; it should only ask for what it's going to need.

This should be enough:

def copy_behaviors(from_name: str, to_name: str, behavior: dict):
    output = {}

    for key, value in behavior.items():
        if isinstance(key, str):
            if key == from_name:
                output[to_name] = value
        else:
            if from_name in key:
                new_tuple = tuple(to_name if k == from_name else k for k in key)
                output[new_tuple] = value

    return output

which would then be used like this:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))

class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

or like this:

@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)

ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))

@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass

ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

The general strategy, for a developer who is using the copy_behaviors function, is to (1) copy everything and (2) replace what's different about this new class. The copy_behaviors API needs to be flexible enough to let them do that. Since some ways of defining behaviors are bound to a class decorator, constraining when they can be assigned, the copy_behaviors has to not depend on the class object, or else there will be a circular dependency.

Saransh-cpp · 2024-07-04T15:58:01Z

Thank you for the detailed explanation! I went through it and have created a PR with the fix.

nsmith- · 2024-07-09T13:52:20Z

How should I interpret the fact this is in the _util module? Is it private? Is it ok to use in coffea?

Saransh-cpp · 2024-07-10T12:00:18Z

copy_behaviors is definitely intended to be a public API. Maybe I should move the function to another module?

Saransh-cpp marked this pull request as draft May 30, 2024 20:01

Saransh-cpp temporarily deployed to docs May 30, 2024 20:15 — with GitHub Actions Inactive

Saransh-cpp temporarily deployed to docs May 30, 2024 20:37 — with GitHub Actions Inactive

Saransh-cpp mentioned this pull request May 30, 2024

refactor!: back nanoevents.methods.vector with scikit-hep vector CoffeaTeam/coffea#991

Merged

feat: copy_behaviors to make sub-classing easy

4e790e4

Saransh-cpp force-pushed the easy-subclassing branch from 0dc14c1 to a60f913 Compare June 7, 2024 11:46

Saransh-cpp temporarily deployed to docs June 7, 2024 11:56 — with GitHub Actions Inactive

Saransh-cpp marked this pull request as ready for review June 7, 2024 12:09

Saransh-cpp requested a review from jpivarski June 20, 2024 13:30

jpivarski requested changes Jun 20, 2024

View reviewed changes

src/awkward/_util.py Outdated Show resolved Hide resolved

pylint errors and tests

3113246

Saransh-cpp force-pushed the easy-subclassing branch from a60f913 to 3113246 Compare June 24, 2024 10:00

Saransh-cpp temporarily deployed to docs June 24, 2024 10:08 — with GitHub Actions Inactive

jpivarski and others added 2 commits June 25, 2024 11:58

make 'copy_behaviors' safer by making it immutable

02cc894

Merge branch 'main' into easy-subclassing

dd9ab7b

jpivarski approved these changes Jun 25, 2024

View reviewed changes

jpivarski deployed to docs June 25, 2024 17:11 — with GitHub Actions View deployment

jpivarski approved these changes Jun 25, 2024

View reviewed changes

jpivarski approved these changes Jun 26, 2024

View reviewed changes

jpivarski merged commit b458523 into main Jun 26, 2024
39 checks passed

jpivarski deleted the easy-subclassing branch June 26, 2024 16:58

Saransh-cpp mentioned this pull request Jul 19, 2024

fix: refine the implementation of copy_behaviors #3177

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: copy_behaviors to make sub-classing easy #3137

feat: copy_behaviors to make sub-classing easy #3137

Saransh-cpp commented May 30, 2024 •

edited

Loading

lgray commented Jun 18, 2024

Saransh-cpp commented Jun 20, 2024

jpivarski left a comment

jpivarski left a comment

jpivarski left a comment

Saransh-cpp commented Jun 30, 2024

jpivarski commented Jul 1, 2024

Saransh-cpp commented Jul 4, 2024

nsmith- commented Jul 9, 2024

Saransh-cpp commented Jul 10, 2024

feat: copy_behaviors to make sub-classing easy #3137

feat: copy_behaviors to make sub-classing easy #3137

Conversation

Saransh-cpp commented May 30, 2024 • edited Loading

lgray commented Jun 18, 2024

Saransh-cpp commented Jun 20, 2024

jpivarski left a comment

Choose a reason for hiding this comment

jpivarski left a comment

Choose a reason for hiding this comment

jpivarski left a comment

Choose a reason for hiding this comment

Saransh-cpp commented Jun 30, 2024

jpivarski commented Jul 1, 2024

Saransh-cpp commented Jul 4, 2024

nsmith- commented Jul 9, 2024

Saransh-cpp commented Jul 10, 2024

Saransh-cpp commented May 30, 2024 •

edited

Loading